Linear and Quadratic Discriminant Analysis - LDA & QDA

Overview

Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) are two classic classifiers which differ mainly in the type of decision surface they employ:

  • LDA uses a linear decision surface.
  • QDA uses a quadratic decision surface.

These classifiers are favored for several reasons:

  • They offer closed-form solutions that are computationally efficient.
  • They inherently support multiclass classification.
  • They do not require hyperparameter tuning.

Decision Boundaries

  • LDA is limited to linear boundaries.
  • QDA, with its quadratic boundaries, provides more flexibility and can adapt to more complex patterns.

Dimensionality Reduction with LDA

LDA is also used for supervised dimensionality reduction by projecting input data onto a linear subspace. This subspace is defined by the directions that maximize the separation between different classes. Key points include:

  • The dimensionality reduction can be substantial: the projected data has at most min(n_classes - 1, n_features) dimensions.
  • The method is most effective in a multiclass context.
  • The n_components parameter specifies the target dimensionality of the projection; it only affects the transform method and has no influence on fit or predict (see the sketch below).
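
A minimal sketch using scikit-learn's bundled iris dataset (3 classes, 4 features, so at most 2 discriminant components are available); the dataset choice is purely illustrative:

# Supervised dimensionality reduction with LDA on the iris dataset
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2)
X_projected = lda.fit(X, y).transform(X)
print(X_projected.shape)  # (150, 2): capped at n_classes - 1 = 2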

Mathematical Formulation

General Formulation for LDA and QDA

Both classifiers derive from probabilistic models with the assumption that the class conditional distributions are Gaussian:

P(x \mid y = k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu_k)^t \Sigma_k^{-1} (x - \mu_k)\right)

where $d$ is the number of features, $\mu_k$ is the mean of class $k$, and $\Sigma_k$ is the covariance matrix of class $k$.
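
Predictions are then obtained via Bayes' rule: a sample $x$ is assigned to the class $k$ that maximizes the posterior probability

P(y = k \mid x) = \frac{P(x \mid y = k)\, P(y = k)}{\sum_{l} P(x \mid y = l)\, P(y = l)}

LDA and QDA differ only in their assumptions about the class covariance matrices $\Sigma_k$.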

Specifics for QDA

  • QDA allows each class to have its own covariance matrix $\Sigma_k$, leading to a more flexible classifier whose decision boundaries are quadratic surfaces (a sketch follows this list).
  • If covariance matrices are diagonal, QDA simplifies to the Gaussian Naive Bayes classifier.
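
To make the quadratic form concrete, here is a minimal NumPy sketch of the per-class score that QDA maximizes; it is an illustration rather than scikit-learn's internal implementation, and mu_k, Sigma_k, and prior_k are assumed to be already-estimated quantities for class k.

# Per-class QDA log-posterior, up to an additive constant shared by all classes.
# mu_k: class mean vector, Sigma_k: class covariance matrix, prior_k: class prior.
import numpy as np

def qda_score(x, mu_k, Sigma_k, prior_k):
    diff = x - mu_k
    return (-0.5 * np.log(np.linalg.det(Sigma_k))
            - 0.5 * diff @ np.linalg.inv(Sigma_k) @ diff
            + np.log(prior_k))

# The predicted class is the one with the largest score; because Sigma_k differs
# per class, the score is a different quadratic function of x for each class,
# which is what produces quadratic decision boundaries.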

Specifics for LDA

  • LDA assumes a shared covariance matrix $\Sigma$ across all classes, which simplifies the log-posterior to:
\log P(y = k \mid x) = -\frac{1}{2} (x - \mu_k)^t \Sigma^{-1} (x - \mu_k) + \log P(y = k) + \mathrm{Cst.}
where $\mathrm{Cst.}$ collects the terms that do not depend on the class $k$.
  • In practice, LDA compares the Mahalanobis distance from a sample to each class mean under the shared covariance (adjusted by the class prior) and assigns the sample to the closest class (see the sketch after this list).
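
A corresponding NumPy sketch of the linear score, again purely illustrative; Sigma_inv is an assumed precision matrix (inverse of the shared covariance), and mu_k and prior_k are the estimated mean and prior of class k.

# Per-class LDA score: Mahalanobis-type term under the shared covariance plus log prior.
import numpy as np

def lda_score(x, mu_k, Sigma_inv, prior_k):
    diff = x - mu_k
    return -0.5 * diff @ Sigma_inv @ diff + np.log(prior_k)

# The quadratic term x^T Sigma_inv x is identical for every class, so differences
# between class scores are linear in x, which is why the decision boundaries are linear.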

Shrinkage and Covariance Estimation

Shrinkage improves the estimation of the covariance matrix and is particularly useful when the number of features is large relative to the number of samples. Key implementations include:

  • Automatic shrinkage (shrinkage='auto'), which uses the Ledoit and Wolf lemma to determine the optimal shrinkage.
  • Manual setting of the shrinkage parameter, which can take any value between 0 (no shrinkage: the empirical covariance matrix is used) and 1 (full shrinkage: only the diagonal matrix of per-feature variances is used); see the sketch after this list.
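
A short sketch of the available options; the OAS estimator from sklearn.covariance is shown as one example of a custom covariance estimator, which recent scikit-learn releases accept through the covariance_estimator parameter (lsqr and eigen solvers only).

from sklearn.covariance import OAS
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Automatic shrinkage via the Ledoit-Wolf lemma
lda_auto = LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto')
# Manually chosen shrinkage intensity between 0 and 1
lda_manual = LinearDiscriminantAnalysis(solver='lsqr', shrinkage=0.5)
# Plugging in a custom covariance estimator instead of shrinkage
lda_oas = LinearDiscriminantAnalysis(solver='lsqr', covariance_estimator=OAS())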

Estimation Algorithms

  • SVD Solver: The default for LDA; it does not compute the covariance matrix explicitly, which makes it well suited to data with many features, but it cannot be combined with shrinkage or a custom covariance estimator.
  • LSQR Solver: Computes the coefficients by solving a system of linear equations; it supports shrinkage and custom covariance estimators, but can only be used for classification (not for transform).
  • Eigen Solver: Optimizes the ratio of between-class scatter to within-class scatter and supports shrinkage.
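
The example below puts these pieces together: it fits LDA with the lsqr solver and automatic shrinkage alongside a default QDA model, using placeholder training and test arrays.
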
# Import necessary classes from scikit-learn
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis

# Initialize Linear Discriminant Analysis with shrinkage
lda = LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto')
# Initialize Quadratic Discriminant Analysis
qda = QuadraticDiscriminantAnalysis()

# Assume X_train, y_train are training data and labels; X_test is test data
# Fitting the models
lda.fit(X_train, y_train)
qda.fit(X_train, y_train)

# Making predictions on test data
lda_predictions = lda.predict(X_test)
qda_predictions = qda.predict(X_test)

# Using LDA for dimensionality reduction
# Set the number of components for LDA
lda_for_reduction = LinearDiscriminantAnalysis(n_components=2)
lda_for_reduction.fit(X_train, y_train)
# Transform training data to a lower dimension
X_train_reduced = lda_for_reduction.transform(X_train)
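
The projected training data can then be visualized or passed to any downstream estimator. A minimal plotting sketch, assuming matplotlib is installed and y_train holds integer class labels:

# Scatter plot of the 2-D discriminant projection, colored by class
import matplotlib.pyplot as plt
plt.scatter(X_train_reduced[:, 0], X_train_reduced[:, 1], c=y_train)
plt.xlabel('Discriminant 1')
plt.ylabel('Discriminant 2')
plt.show()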